Programmatic Root Cause Analysis For Application Performance Management

ABSTRACT

Programmatic root cause analysis of application performance problems is provided in accordance with various embodiments. Transactions having multiple components can be monitored to determine if they are exceeding a threshold for their execution time. Monitoring the transactions can include instrumenting one or more applications to gather component level information. For transactions exceeding a threshold, the data collected for the individual components can be analyzed to automatically diagnose the potential cause of the performance problem. Time-series analytical techniques are employed to determine normal values for transaction and component execution times. The values can be dynamic or static. Deviations from these normal values can be detected and reported as a possible cause. Other filters in addition to or in place of execution times for transactions and components can also be used.

BACKGROUND OF THE INVENTION

1. Field of the Invention

Embodiments of the present disclosure are directed to application performance management.

2. Description of the Related Art

Maintaining and improving application performance is an integral part of success for many of today's institutions. Businesses and other entities progressively rely on increased numbers of software applications for day to day operations. Consider a business having a presence on the World Wide Web. Typically, such a business will provide one or more web sites that run one or more web-based applications. A disadvantage of conducting business via the Internet in this manner is the reliance on software and hardware infrastructures for handling business transactions. If a web site goes down, becomes unresponsive or otherwise fails to properly serve customers, the business may lose potential sales and/or customers. Intranets and Extranets pose similar concerns for these businesses. Thus, there exists a need to monitor web-based, and other applications, to ensure they are performing properly or according to expectation.

For many application developers, a particular area of concern in these types of environments is transaction time. Longer transaction times may correlate directly to fewer transactions and thus, lost sales, etc. It may be expected that a particular task that forms part of a type of transaction may take a fraction of a second to complete its function(s). The task may execute for longer than expected for one or more transactions due to a problem somewhere in the system. Slowly executing tasks can degrade a site's performance, degrade application performance, and consequently, cause failure of the site or application.

Accordingly, developers seek to debug software when an application or transaction is performing poorly to determine what part of the code is causing the performance problem. While it may be relatively easy to detect when an application is performing slowly because of slow response times or longer transaction times, it is often difficult to diagnose which portion of the software is responsible for the degraded performance. Typically, developers must manually diagnose portions of the code based on manual observations. Even if a developer successfully determines which method, function, routine, process, etc. is executing when an issue occurs, it is often difficult to determine whether the problem lies with the identified method, etc., or whether the problem lies with another method, function, routine, process, etc. that is called by the identified method. Furthermore, it is often not apparent what is a typical or appropriate execution time for a portion of an application or transaction. Thus, even with information regarding the time associated with a piece of code, the developer may not be able to determine whether the execution time is indicative of a performance problem or not.

SUMMARY OF THE INVENTION

Programmatic root cause analysis of application performance problems is provided in accordance with various embodiments. Transactions having multiple components can be monitored to determine if they are exceeding a threshold for their execution time. Monitoring the transactions can include instrumenting one or more applications to gather component level information. For transactions exceeding a threshold, the data collected for the individual components can be analyzed to automatically diagnose the potential cause of the performance problem. Time-series analytical techniques are employed to determine normal values for transaction and component execution times. The values can be dynamic or static. Deviations from these normal values can be detected and reported as a possible cause. Other filters in addition to or in place of execution times for transactions and components can also be used.

In one embodiment, a method of processing data is provided that includes collecting data about a set of transactions that each include a plurality of components associated with a plurality of tasks. The data includes time series data for each task based on execution times of components associated with the task during the set of transactions. The method further includes determining whether the transactions have execution times exceeding a threshold and for each transaction having an execution time exceeding the threshold, identifying one or more components based on a deviation in time series data for a task that is associated with the one or more components of each transaction, and reporting said one or more components for said each transaction.

One embodiment includes an apparatus for monitoring software that includes one or more agents and a manager in communication with the agents. The agents collect data about a set of transactions that each include a plurality of components associated with a plurality of systems. The manager performs a method including receiving the data about the set of transactions from the one or more agents and developing time series data for each of the systems based on execution times of components associated with each system during the set of transactions. For each transaction having an execution time beyond a threshold, the manager identifies one or more components based on a deviation in time series data for a system that is associated with the one or more components of each transaction, and reports the one or more components for each transaction.

Embodiments in accordance with the present disclosure can be accomplished using hardware, software or a combination of both hardware and software. The software can be stored on one or more processor readable storage devices such as hard disk drives, CD-ROMs, DVDs, optical disks, floppy disks, tape drives, RAM, ROM, flash memory or other suitable storage device(s). In alternative embodiments, some or all of the software can be replaced by dedicated hardware including custom integrated circuits, gate arrays, FPGAs, PLDs, and special purpose processors. In one embodiment, software (stored on a storage device) implementing one or more embodiments is used to program one or more processors. The one or more processors can be in communication with one or more storage devices, peripherals and/or communication interfaces.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system including a tool for monitoring an application in accordance with one embodiment.

FIG. 2 is a block diagram depicting the instrumentation of byte code by a probe builder in accordance with one embodiment.

FIG. 3 is a flowchart of a process for tracing transactions in accordance with one embodiment using the system of FIG. 1.

FIG. 4 is a flowchart of a process for starting the tracing of transactions in accordance with one embodiment.

FIG. 5 is a flowchart of a process for concluding the tracing of transactions in accordance with one embodiment.

FIG. 6 depicts a graphical user interface in accordance with one embodiment.

FIG. 7 depicts a portion of the graphical user interface of FIG. 6.

FIG. 8 is a table depicting exemplary component data for a plurality of transactions collected by an enterprise manager in accordance with one embodiment.

FIG. 9 is a table depicting exemplary component data for a plurality of transactions collected by an enterprise manager in accordance with one embodiment.

FIG. 10 is a table depicting exemplary component data for a plurality of transactions collected by an enterprise manager in accordance with one embodiment.

FIG. 11 is a flowchart of a process for collecting and reporting data about transactions in accordance with one embodiment.

FIG. 12 is a flowchart of a process for dynamically updating threshold and/or normal execution time data for a type of transaction in accordance with one embodiment.

FIG. 13 is a flowchart of a process for dynamically updating threshold and/or normal execution time data for a type of component (task) in accordance with one embodiment.

FIG. 14 is a flowchart of a process in accordance with one embodiment for reporting data in the transaction trace table of the graphical user interface depicted in FIG. 6.

FIG. 15 is a flowchart of a process for displaying a transaction snapshot in accordance with one embodiment.

FIG. 16 is a flowchart of a process for drawing a view for a component in accordance with one embodiment.

FIG. 17 is a flowchart of a process for reporting detailed information about a component of a transaction in accordance with one embodiment.

DETAILED DESCRIPTION

Programmatic root cause analysis of performance problems for application performance management is provided in accordance with embodiments of the present disclosure. Transactions are traced and one or more components of a transaction that are executing too slowly or otherwise causing a performance problem are reported. A transaction is traced to determine whether its execution time is beyond a threshold. If a transaction has a root level execution time outside a threshold, it can be reported. Tracing the transaction includes collecting information regarding the execution times of individual components of the transaction. For reported transactions, one or more components of the transaction can be identified and reported as a potential cause of the slow execution time for the transaction. If a particular component has an execution time beyond a threshold for a task or system associated with the component, the component can be identified and reported.

Component data is collected when tracing a set of transactions of a particular type. This component data can be organized into time series data for a particular type of component. For example, time series data can be formulated using the execution time of related components of multiple transactions. Related components can include a component from each transaction that is responsible for executing a particular task. The execution time of these components can be organized into time series data for the particular task. The data can also be organized by the system associated with or on which each of the components execute. If a particular transaction is performing abnormally, each of its components can be examined. Each component's execution time can be compared to a threshold based on a normal execution time for the task or system associated with that component. If a component's execution time is outside a normal time for the task it performs or the system with which it is associated, the component can be reported as a potential cause of the transaction's performance problem.
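The per-task comparison described above can be illustrated with a minimal sketch. The class name TaskTimeSeries, its methods, and the use of a plain arithmetic mean as the "normal" value are assumptions made for illustration only; the disclosure does not prescribe a particular data structure or statistic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: execution times grouped into one time series per task. */
class TaskTimeSeries {
    // Execution times in milliseconds, keyed by task name, in collection order.
    private final Map<String, List<Long>> series = new LinkedHashMap<>();

    /** Records the execution time of the component that performed the given task. */
    void record(String task, long millis) {
        series.computeIfAbsent(task, k -> new ArrayList<>()).add(millis);
    }

    /** "Normal" value for a task. A plain mean is assumed here; NaN if no data exists. */
    double normal(String task) {
        List<Long> times = series.get(task);
        if (times == null || times.isEmpty()) {
            return Double.NaN;
        }
        return times.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
    }

    /**
     * Returns the tasks whose execution time in one transaction exceeds the
     * normal value by more than maxDeviation, expressed as a fraction (0.5 = 50%).
     */
    List<String> suspects(Map<String, Long> transactionTimes, double maxDeviation) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : transactionTimes.entrySet()) {
            double normal = normal(e.getKey());
            if (!Double.isNaN(normal) && e.getValue() > normal * (1.0 + maxDeviation)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

In this sketch, a slow transaction's components are passed to suspects as a map from task name to execution time, and only the tasks that deviate from their own history are returned as potential causes. The same structure could be keyed by system rather than by task.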

In one embodiment, a graphical user interface is used to report transactions and components that exceed a threshold. For each reported transaction, a visualization can be provided that enables a user to immediately understand where time was spent in a traced transaction. The visualization can identify select components of the reported transaction as a potential cause of a transaction's performance problem, by virtue of having an execution time beyond a threshold. The component thresholds may take the form of threshold deviations from a normal value or threshold execution times.

In one embodiment of the present disclosure, methods, etc. in a JAVA environment are monitored. In such an embodiment, a transaction may be a method invocation in a running software system that enters the JAVA virtual machine (JVM) and exits the JVM (and all that it calls). A system in accordance with embodiments as hereinafter described can initiate transaction tracing on one, some or all transactions managed by the system. Although embodiments are principally disclosed using JAVA implementation examples, the disclosed technology is not so limited and may be used in and with other programming languages, paradigms, systems and/or environments.

In one embodiment, an application performance management tool is provided that implements the performance analysis described herein. FIG. 1 provides a conceptual view of one such implementation. The tool includes an enterprise manager 120, database 122, workstation 124, and workstation 126. FIG. 1 also depicts a managed application 6 containing probe 102, probe 104, and agent 8. As the managed application runs, the probes relay data to agent 8. Agent 8 collects, summarizes, and sends the data to enterprise manager 120.

Enterprise manager 120 receives performance data from managed applications via agent 8, runs requested calculations, makes performance data available to workstations 124, 126 and optionally sends performance data to database 122 for later analysis. The workstations include the graphical user interface for viewing performance data. The workstations are used to create custom views of performance data which can be monitored by a human operator. In one embodiment, the workstations consist of two main windows: a console and an explorer. The console displays performance data in a set of customizable views. The explorer depicts alerts and calculators that filter performance data so that the data can be viewed in a meaningful way. The elements of the workstation that organize, manipulate, filter and display performance data include actions, alerts, calculators, dashboards, persistent collections, metric groupings, comparisons, smart triggers and SNMP collections.

In one embodiment of FIG. 1, each component runs on a different machine. For example, workstation 126 is on a first computing device, workstation 124 is on a second computing device, enterprise manager 120 is on a third computing device, and managed application 6 is on a fourth computing device. In another embodiment, two or more (or all) of the components are operating on the same computing device. For example, managed application 6 and agent 8 may be on a first computing device, enterprise manager 120 on a second computing device and a workstation on a third computing device. Any or all of these computing devices can be any of various different types of computing devices, including personal computers, minicomputers, mainframes, servers, handheld computing devices, mobile computing devices, etc. Typically, these computing devices will include one or more processors in communication with one or more processor readable storage devices, communication interfaces, peripheral devices, etc. Examples of the storage devices include RAM, ROM, hard disk drives, floppy disk drives, CD-ROMs, DVDs, flash memory, etc. Examples of peripherals include printers, monitors, keyboards, pointing devices, etc. Examples of communication interfaces include network cards, modems, wireless transmitters/receivers, etc. The system running the managed application can include a web server/application server. The system running the managed application may also be part of a network, including a LAN, a WAN, the Internet, etc. In some embodiments, all or part of the disclosed technology is implemented in software that is stored on one or more processor readable storage devices and is used to program one or more processors.

In one embodiment, an application performance management tool monitors performance of an application by accessing the application's source code and modifying that source code. In some instances, however, the source code may not be available to the application performance management tool. Accordingly, another embodiment monitors performance of an application without requiring access to or modification of the application's source code. Rather, the tool can instrument the application's object code (also called bytecode).

FIG. 2 depicts an exemplary process for modifying an application's bytecode to create managed application 6. FIG. 2 includes application 2, probe builder 4, application 6 and agent 8. Application 6 includes probes, which will be discussed in more detail below. Application 2 is the Java application before the probes are added. In embodiments that use a programming language other than Java, application 2 can be a different type of application.

Probe builder 4 instruments (e.g. modifies) the bytecode for application 2 to add probes and additional code to application 2 in order to create application 6. The probes measure specific pieces of information about the application without changing the application's business logic. Probe builder 4 also installs agent 8 on the same machine as application 6. Once the probes have been installed in the bytecode, the Java application is referred to as a managed application. More information about instrumenting bytecode can be found in the following: U.S. Pat. No. 6,260,187, entitled “System For Modifying Object Oriented Code;” U.S. patent application Ser. No. 09/795,901, entitled “Adding Functionality to Existing Code at Exits;” U.S. patent application Ser. No. 10/692,250, entitled “Assessing Information at Object Creation;” and U.S. patent application Ser. No. 10/622,022, entitled “Assessing Return Values and Exceptions,” all of which are incorporated by reference herein in their entirety.

In accordance with one embodiment, bytecode is instrumented by adding new code that activates a tracing mechanism when a method starts and terminates the tracing mechanism when the method completes. To better explain this concept consider the following exemplary pseudo code for a method called “exampleMethod.” This method receives an integer parameter, adds 1 to the integer parameter, and returns the sum:

    public int exampleMethod(int x) {
        return x + 1;
    }

One embodiment will instrument this code, conceptually, by including a call to a tracer method, grouping the original instructions from the method in a “try” block, and adding a “finally” block with code that stops the tracer:

    public int exampleMethod(int x) {
        IMethodTracer tracer = AMethodTracer.loadTracer(
            "com.introscope.agenttrace.MethodTimer",
            this,
            "com.wily.example.ExampleApp",
            "exampleMethod",
            "name=Example Stat");
        try {
            return x + 1;
        } finally {
            tracer.finishTrace();
        }
    }

IMethodTracer is an interface that defines a tracer for profiling. AMethodTracer is an abstract class that implements IMethodTracer. IMethodTracer includes the methods startTrace and finishTrace. AMethodTracer includes the methods startTrace, finishTrace, doStartTrace and doFinishTrace. The method startTrace is called to start a tracer, perform error handling and perform setup for starting the tracer. The actual tracer is started by the method doStartTrace, which is called by startTrace. The method finishTrace is called to stop the tracer and perform error handling. The method finishTrace calls doFinishTrace to actually stop the tracer. Within AMethodTracer, startTrace and finishTrace are final and void methods; and doStartTrace and doFinishTrace are protected, abstract and void methods. Thus, the methods doStartTrace and doFinishTrace must be implemented in subclasses of AMethodTracer. Each of the subclasses of AMethodTracer implements an actual tracer. The method loadTracer is a static method that calls startTrace and takes five parameters. The first parameter, “com.introscope . . . ” is the name of the class that is intended to be instantiated that implements the tracer. The second parameter, “this” is the object being traced. The third parameter, “com.wily.example . . . ,” is the name of the class containing the current instruction. The fourth parameter, “exampleMethod,” is the name of the method containing the current instruction. The fifth parameter, “name= . . . ” is the name under which the statistics are recorded. The original instruction (return x+1) is placed inside a “try” block. The code for stopping the tracer (a call to the method tracer.finishTrace) is put within the “finally” block.
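The relationship between these types can be sketched as follows. This is a simplified illustration, not the actual tracer classes: the single-argument loadTracer shown here is a hypothetical stand-in for the five-parameter method described above (it skips looking up the tracer class by name), and the error handling is assumed to simply suppress tracer failures so that the traced application is unaffected.

```java
/** Tracer contract as described in the text. */
interface IMethodTracer {
    void startTrace();
    void finishTrace();
}

/** Template-method base class: final entry points delegate to abstract hooks. */
abstract class AMethodTracer implements IMethodTracer {
    @Override
    public final void startTrace() {
        try {
            doStartTrace();
        } catch (RuntimeException e) {
            // Assumed error handling: a tracer failure must not break the application.
        }
    }

    @Override
    public final void finishTrace() {
        try {
            doFinishTrace();
        } catch (RuntimeException e) {
            // Assumed error handling, as above.
        }
    }

    protected abstract void doStartTrace();

    protected abstract void doFinishTrace();

    /** Hypothetical simplification of loadTracer: starts the tracer and returns it. */
    static IMethodTracer loadTracer(AMethodTracer tracer) {
        tracer.startTrace();
        return tracer;
    }
}

/** One concrete tracer: measures elapsed time between start and finish. */
class MethodTimer extends AMethodTracer {
    private long startNanos;
    private long elapsedNanos = -1; // -1 until the trace has finished

    @Override
    protected void doStartTrace() {
        startNanos = System.nanoTime();
    }

    @Override
    protected void doFinishTrace() {
        elapsedNanos = System.nanoTime() - startNanos;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }
}

/** The instrumented method from the text, using the simplified loadTracer. */
class ExampleApp {
    MethodTimer lastTimer; // kept only so the sketch can be inspected afterwards

    int exampleMethod(int x) {
        lastTimer = new MethodTimer();
        IMethodTracer tracer = AMethodTracer.loadTracer(lastTimer);
        try {
            return x + 1;
        } finally {
            tracer.finishTrace();
        }
    }
}
```

Because the entry points are final, every subclass inherits the same setup and error handling and only supplies the measurement itself.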

The above example shows source code being instrumented. In one embodiment, source code is not actually modified. Rather, an application management tool modifies object code. The source code examples above are used for illustration to explain the concept of instrumentation in accordance with embodiments. The object code is modified conceptually in the same manner that source code modifications are explained above. That is, the object code is modified to add the functionality of the “try” block and “finally” block. In another embodiment, the source code can be modified as explained above.

In a typical implementation including an application performance management tool as provided herein, more than one application will be monitored. The various applications can reside on a single computing device or on different computing devices. An agent may be installed for each managed application or on only a subset of the applications. Each agent will report back to enterprise manager 120 with data collected for the application it manages. Agents can also report data for applications that they do not directly manage, such as an application on a different computing device. The agent may collect data by monitoring response times or installing scripts to collect data from a remote application. For example, JavaScript inserted into a returned web page can execute to determine the execution time of a remote application such as a browser.

FIG. 3 is a flowchart describing one embodiment of a process for tracing transactions using the system of FIG. 1. In step 200, a transaction trace session is started. In one embodiment of step 200, a window is opened and a user selects a dropdown menu to start a transaction trace session. In other embodiments, other methods can be used to start the session. In step 202, a dialog box is presented to the user. This dialog box will ask the user for various configuration information. In step 204, the various configuration information is provided by the user typing information into the dialog box. Other means for entering the information can also be used within the spirit of the present disclosure.

One variable entered by the user in step 204 is the threshold trace period. That is, the user enters a time, which could be in seconds, milliseconds, microseconds, etc. The system will only report those transactions that have an execution time longer than the threshold period provided. For example, if the threshold is one second, the system will only report transactions that are executing for longer than one second. In some embodiments, step 204 only includes providing a threshold time period. In other embodiments, other configuration data can also be provided. For example, the user can identify an agent, a set of agents, or all agents. In such an embodiment, only identified agents will perform the transaction tracing described herein. In another embodiment, enterprise manager 120 will determine which agents to use.

Another configuration variable that can be provided is the session length. The session length indicates how long the system will perform the tracing. For example, if the session length is ten minutes, the system will only trace transactions for ten minutes. At the end of the ten minute period, new transactions that are started will not be traced. However, transactions that have already started during the ten minute period will continue to be traced. In other embodiments, at the end of the session length, all tracing will cease regardless of when the transaction started. Other configuration data can also include specifying one or more userIDs, a flag set by an external process or other data of interest to the user. For example, the userID is used to specify that only transactions initiated by processes associated with a particular one or more userIDs will be traced. The flag is used so that an external process can set a flag for certain transactions, and only those transactions that have the flag set will be traced. Other parameters can also be used to identify which transactions to trace. The information provided in step 204 can be used to create a filter.

In other embodiments as will be more fully described hereinafter, variations to the trace period are utilized. A user may specify a threshold execution time for a type of transaction. A user may specify a threshold deviation from a normal execution time and capture faster or more slowly executing transactions. Transactions exceeding the corresponding threshold will be reported. In one embodiment, a user does not provide a threshold execution time, deviation, or trace period for transactions being traced. Rather, the application performance management tool intelligently determines the threshold(s). For example, the tool can average execution times of transactions of a particular type to determine a corresponding threshold execution time. The threshold time can be a static value or a dynamic value that is updated as more transaction data is collected. The threshold may be a running average based on a number of previous transactions. Other more sophisticated time series techniques may also be used as will be described hereinafter.
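One way to picture a dynamic threshold based on a running average is the sketch below. The class name RunningAverageThreshold, the fixed window size, and the multiplier applied to the average are all assumptions chosen for illustration; the text only states that the threshold may be a running average over a number of previous transactions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: threshold derived from the mean of the last N execution times. */
class RunningAverageThreshold {
    private final int window;     // number of previous transactions considered
    private final double factor;  // assumed multiplier applied to the running average
    private final Deque<Long> recent = new ArrayDeque<>();
    private long sum;             // sum of the values currently in the window

    RunningAverageThreshold(int window, double factor) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
        this.factor = factor;
    }

    /** Adds one observed execution time and evicts the oldest value if the window is full. */
    void record(long millis) {
        recent.addLast(millis);
        sum += millis;
        if (recent.size() > window) {
            sum -= recent.removeFirst();
        }
    }

    /** Current threshold; infinite until at least one transaction has been observed. */
    double threshold() {
        return recent.isEmpty() ? Double.POSITIVE_INFINITY : factor * sum / recent.size();
    }

    /** True if the given execution time is beyond the current threshold. */
    boolean exceeds(long millis) {
        return millis > threshold();
    }
}
```

A static threshold corresponds to never calling record after an initial calibration, while a dynamic threshold keeps calling it as new transaction data arrives.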

In step 206 of FIG. 3, the workstation adds the new filter to a list of filters on the workstation. In step 208, the workstation requests enterprise manager 120 to start the trace using the new filter. In step 210, enterprise manager 120 adds the filter received from the workstation to a list of filters. For each filter in its list, enterprise manager 120 stores an identification of the workstation that requested the filter, the details of the filter (described above), and the agents to which the filter applies. In one embodiment, if the workstation does not specify the agents to which the filter applies, then the filter will apply to all agents. In step 212, enterprise manager 120 requests the appropriate agents to perform the trace. In step 214, the appropriate agents perform the trace. In step 216, the agents performing the trace send data to enterprise manager 120. More information about steps 214 and 216 will be provided below. In step 218, enterprise manager 120 matches the received data to the appropriate workstation/filter/agent entry. In step 220, enterprise manager 120 forwards the data to the appropriate workstation(s) based on the matching in step 218. In step 222, the appropriate workstations report the data. In one embodiment, the workstation can report the data by writing information to a text file, to a relational database, or other data container. In another embodiment, a workstation can report the data by displaying the data in a GUI. More information about how data is reported is provided below.

As noted above, agents perform tracing for transactions. To perform such tracing, the agents can leverage what is called Blame Technology in one embodiment. Blame Technology works in a managed Java application to enable the identification of component interactions and component resource usage. Blame Technology tracks components that are specified to it. Blame Technology uses the concepts of consumers and resources. Consumers request some activity while resources perform the activity. A component can be both a consumer and a resource, depending on the context.

When reporting about transactions, the word Called designates a resource. This resource is a resource (or a sub-resource) of the parent component, which is the consumer. For example, under the consumer Servlet A (see below), there may be a sub-resource Called EJB. Consumers and resources can be reported in a tree-like manner. Data for a transaction can also be stored according to the tree. For example, if a Servlet (e.g. Servlet A) is a consumer of a network socket (e.g. Socket C) and is also a consumer of an EJB (e.g. EJB B), which is a consumer of a JDBC (e.g. JDBC D), the tree might look something like the following:

Servlet A
  Data for Servlet A
  Called EJB B
    Data for EJB B
    Called JDBC D
      Data for JDBC D
  Called Socket C
    Data for Socket C

In one embodiment, the above tree is stored by the agent in a stack. This stack is called the Blame Stack. When transactions are started, they are pushed onto the stack. When transactions are completed, they are popped off the stack. In one embodiment, each transaction on the stack has the following information stored: type of transaction, a name used by the system for that transaction, a hash map of parameters, a timestamp for when the transaction was pushed onto the stack, and sub-elements. Sub-elements are Blame Stack entries for other components (e.g. methods, processes, procedures, functions, threads, sets of instructions, etc.) that are started from within the transaction of interest. Using the tree above as an example, the Blame Stack entry for Servlet A would have two sub-elements. The first sub-element would be an entry for EJB B and the second sub-element would be an entry for Socket C. Even though a sub-element is part of an entry for a particular transaction, the sub-element will also have its own Blame Stack entry. As the tree above notes, EJB B is a sub-element of Servlet A and also has its own entry. The top (or initial) entry (e.g., Servlet A) for a transaction is called the root component. Each of the entries on the stack is an object. While the embodiment described herein includes the use of Blame Technology and a stack, other embodiments can use different types of stacks, different types of data structures, or other means for storing information about transactions.
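A stack whose entries carry their own sub-elements can be sketched as below. The names BlameEntry and BlameStack, the field names, and the choice to pass the current time in as an argument (so the example is deterministic) are illustrative assumptions and not the agent's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stack entry holding the information listed in the text. */
class BlameEntry {
    final String type;                                        // e.g. "Servlet", "EJB"
    final String name;                                        // name used by the system
    final Map<String, String> parameters = new HashMap<>();   // hash map of parameters
    final long startMillis;                                   // timestamp when pushed
    final List<BlameEntry> subElements = new ArrayList<>();   // components started within
    long durationMillis = -1;                                 // set when the entry is popped

    BlameEntry(String type, String name, long startMillis) {
        this.type = type;
        this.name = name;
        this.startMillis = startMillis;
    }
}

/** Hypothetical Blame Stack: pushing links the new entry under the current top. */
class BlameStack {
    private final Deque<BlameEntry> stack = new ArrayDeque<>();

    /** Called when a transaction or component starts. */
    BlameEntry push(String type, String name, long nowMillis) {
        BlameEntry entry = new BlameEntry(type, name, nowMillis);
        BlameEntry parent = stack.peek();
        if (parent != null) {
            parent.subElements.add(entry); // the entry is also a sub-element of its caller
        }
        stack.push(entry);
        return entry;
    }

    /** Called when the most recently started transaction or component ends. */
    BlameEntry pop(long nowMillis) {
        BlameEntry entry = stack.pop();
        entry.durationMillis = nowMillis - entry.startMillis;
        return entry;
    }

    boolean isEmpty() {
        return stack.isEmpty();
    }
}
```

Replaying the example tree (push Servlet A, push EJB B, push JDBC D, pop, pop, push Socket C, pop, pop) leaves a root entry for Servlet A with two sub-elements, the first of which has JDBC D as its own sub-element.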

FIG. 4 is a flowchart describing one embodiment of a process for starting the tracing of a transaction. The steps of FIG. 4 are performed by the appropriate agent(s). In step 302, a transaction starts. In one embodiment, the process is triggered by the start of a method as described above (e.g. the calling of the “loadTracer” method). In step 304, the agent acquires the desired parameter information. In one embodiment, a user can configure which parameter information is to be acquired via a configuration file or the GUI. The acquired parameters are stored in a hash map, which is part of the object pushed onto the Blame Stack. In other embodiments, the identification of parameters is pre-configured. There are many different parameters that can be stored. In one embodiment, the actual list of parameters used is dependent on the application being monitored. The present disclosure is not limited to any particular set of parameters. Table 1 provides examples of some parameters that can be used.

TABLE 1

Parameters | Appears in | Value
UserID | Servlet, JSP | The UserID of the end-user invoking the http servlet request.
URL | Servlet, JSP | The URL passed through to the servlet or JSP, not including the Query String.
URL Query | Servlet, JSP | The portion of the URL that specifies query parameters in the http request (text that follows the ‘?’ delimiter).
Dynamic SQL | Dynamic JDBC Statements | The dynamic SQL statement, either in a generalized form or with all the specific parameters from the current invocation.
Method | Blamed Method timers (everything but Servlets, JSP's and JDBC Statements) | The name of the traced method. If the traced method directly calls another method within the same component, only the “outermost” first encountered method is captured.
Callable SQL | Callable JDBC statements | The callable SQL statement, either in a generalized form or with all the specific parameters from the current invocation.
Prepared SQL | Prepared JDBC statements | The prepared SQL statement, either in a generalized form or with all the specific parameters from the current invocation.
Object | All non-static methods | toString( ) of the this object of the traced component, truncated to some upper limit of characters.
Class Name | All | Fully qualified name of the class of the traced component.
Param_n | All objects with WithParams custom tracers | toString( ) of the nth parameter passed to the traced method of the component.
Primary Key | Entity Beans | toString( ) of the entity bean's property key, truncated to some upper limit of characters.

In step 306, the system acquires a timestamp indicating the current time. In step 308, a stack entry is created. In step 310, the stack entry is pushed onto the Blame Stack. In one embodiment, the timestamp is added as part of step 310. The process of FIG. 4 is performed when a transaction is started. A process similar to that of FIG. 4 is performed when a component of the transaction starts (e.g. EJB B is a component of Servlet A; see the tree described above).
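The start-of-trace steps above can be sketched as follows. This is a minimal illustration in Python rather than the agent's actual implementation; the `StackEntry` fields, the `start_trace` function, and the `wanted` set of configured parameter names are all assumptions introduced for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StackEntry:
    """One Blame Stack entry; the field names are illustrative assumptions."""
    name: str
    start_time: float                              # timestamp from step 306
    params: dict = field(default_factory=dict)     # hash map from step 304
    children: list = field(default_factory=list)   # completed sub-components


def start_trace(blame_stack, name, params, wanted, clock=time.time):
    """Steps 304-310: acquire parameters and a timestamp, then push an entry."""
    # Step 304: keep only the parameters configured for capture.
    captured = {key: value for key, value in params.items() if key in wanted}
    # Steps 306-308: take a timestamp and create the stack entry.
    entry = StackEntry(name=name, start_time=clock(), params=captured)
    # Step 310: push the entry onto the Blame Stack.
    blame_stack.append(entry)
    return entry
```

The same function could be called when a component of the transaction starts, so that nested components stack on top of the entry of their caller.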

FIG. 5 is a flowchart describing one embodiment of a process for concluding the tracing of a transaction. The process of FIG. 5 can be performed by an agent when a transaction ends. In step 340, the process is triggered by a transaction (e.g. method) ending as described above (e.g. calling of the method “finishTrace”). In step 342, the system acquires the current time. In step 344, the stack entry is removed. In step 346, the execution time of the transaction is calculated by comparing the timestamp from step 342 to the timestamp stored in the stack entry. In step 348, the filter for the trace is applied. For example, the filter may include a threshold execution time of one second. Thus, step 348 would include determining whether the calculated duration from step 346 is greater than one second. In another embodiment, a normal value for the type of transaction is used with a threshold deviation. If the transaction's execution time deviates from the normal value by more than the threshold amount, the threshold is determined to be exceeded. If the threshold is not exceeded (step 350), then the data for the transaction is discarded. In one embodiment, the entire stack entry is discarded. In another embodiment, only the parameters and timestamps are discarded. In other embodiments, various subsets of data can be discarded. In some embodiments, if the threshold period is not exceeded then the data is not transmitted by the agent to other components in the system of FIG. 2. If the duration exceeds the threshold (step 350), then the agent builds component data in step 360. Component data is the data about the transaction that will be reported. In one embodiment, the component data includes the name of the transaction, the type of the transaction, the start time of the transaction, the duration of the transaction, a hash map of the parameters, and all of the sub-elements or components of the transaction (which can be a recursive list of elements). Other information can also be part of the component data. In step 362, the agent reports the component data by sending the component data via the TCP/IP protocol to enterprise manager 120.

FIG. 5 represents what happens when a transaction finishes. When a component finishes, the steps can include getting a time stamp, removing the stack entry for the component, and adding the completed sub-element to the previous stack entry. In one embodiment, the filters and decision logic are applied to the start and end of the transaction, rather than to a specific component.
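The end-of-trace handling for transactions and components can be sketched as below. This is an illustrative Python sketch, not the agent's code: the dictionary layout, the `start` and `finish` names, and the `report` callback (standing in for sending component data to the enterprise manager) are assumptions, and times are plain millisecond numbers.

```python
def start(stack, name, now):
    """Push a new entry for a transaction or component that is starting."""
    stack.append({"name": name, "start": now, "children": []})


def finish(stack, now, threshold_ms, report):
    """Pop the top entry; nest it under its caller, or filter and report the root."""
    entry = stack.pop()                       # step 344: remove the stack entry
    entry["duration"] = now - entry["start"]  # step 346: execution time
    if stack:
        # A component finished: add it to the previous stack entry.
        stack[-1]["children"].append(entry)
        return None
    # The transaction finished: apply the filter (steps 348-350).
    if entry["duration"] > threshold_ms:
        report(entry)                         # steps 360-362: build and send
        return entry
    return None                               # threshold not exceeded: discard
```

Applying the filter only when the stack becomes empty mirrors the embodiment in which the decision logic is tied to the transaction rather than to individual components.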

Note that in one embodiment, if the transaction tracer is off, the system will still use the Blame Stack; however, parameters will not be stored and no component data will be created. In some embodiments, the system defaults to starting with the tracing technology off. The tracing only starts after a user requests it, as described above.

FIG. 6 provides one example of a graphical user interface that can be used for reporting transactions and components thereof, in accordance with embodiments of the present disclosure. The GUI includes a transaction trace table 400 which lists all of the transactions that have satisfied the filter (e.g. execution time beyond the threshold). Because the number of rows in the table may exceed the allotted space, the transaction trace table 400 can scroll. Table 2, below, provides a description of each of the columns of transaction trace table 400.

TABLE 2

Column Header | Value
Host | Host that the traced Agent is running on
Process | Agent Process name
Agent | Agent ID
TimeStamp (HH:MM:SS.DDD) | TimeStamp (in Agent's JVM's clock) of the initiation of the Trace Instance's root entry point
Category | Type of component being invoked at the root level of the Trace Instance. This maps to the first segment of the component's relative blame stack. Examples include Servlets, JSP, EJB, JNDI, JDBC, etc.
Name | Name of the component being invoked. This maps to the last segment of the blamed component's metric path (e.g. for “Servlets|MyServlet”, Category would be Servlets, and Name would be MyServlet).
URL | If the root level component is a Servlet or JSP, the URL passed to the Servlet/JSP to invoke this Trace Instance. If the application server provides services to see the externally visible URL (which may differ from the converted URL passed to the Servlet/JSP) then the externally visible URL will be used in preference to the “standard” URL that would be seen in any J2EE Servlet or JSP. If the root level component is not a Servlet or JSP, no value is provided.
Duration (ms) | Execution time of the root level component in the Transaction Trace data
UserID | If the root level component is a Servlet or JSP, and the Agent can successfully detect UserID's in the managed application, the UserID associated with the JSP or Servlet's invocation. If there is no UserID, or the UserID cannot be detected, or the root level component is not a Servlet or JSP, then there will be no value placed in this column.

Each transaction that has an execution time beyond a threshold will appear in the transaction trace table 400. The user can select any of the transactions in the transaction trace table by clicking with the mouse or using a different means for selecting a row. When a transaction is selected, detailed information about that transaction will be displayed in transaction snapshot 402 and snapshot header 404.

Transaction snapshot 402 provides information about which transactional components are called and for how long. Transaction snapshot 402 includes views (see the rectangles) for various components, which will be discussed below. If the user positions a mouse (or other pointer) over any of the views, mouse-over info box 406 is provided. Mouse-over info box 406 indicates the following information for a component: name/type, duration, timestamp and percentage of the transaction time that the component was executing. More information about transaction snapshot 402 will be explained below. Transaction snapshot header 404 includes identification of the agent providing the selected transaction, the timestamp of when that transaction was initiated, and the duration. Transaction snapshot header 404 also includes a slider to zoom in or zoom out the level of detail of the timing information in transaction snapshot 402. The zooming can be done in real time.

In addition to the transaction snapshot, the GUI will also provide additional information about any of the components within the transaction snapshot 402. If the user selects any of the components (e.g., by clicking on a view), detailed information about that component is provided in regions 408, 410, and 412 of the GUI. Region 408 provides component information, including the type of component, the name the system has given to that component and a path to that component. Region 410 provides analysis of that component, including the duration the component was executing, a timestamp for when that component started relative to the start of the entire transaction, and an indication of the percentage of the transaction time that the component was executing. Region 412 includes an indication of any properties. These properties are one or more of the parameters that are stored in the Blame Stack, as discussed above.

The GUI also includes a status bar 414. The status bar includes an indication 416 of how many transactions are in the transaction trace table, an indication 418 of how much time is left for tracing based on the session length, stop button 420, and restart button 422.

FIG. 7 depicts transaction snapshot 402. Along the top of snapshot 402 is time axis 450. In one embodiment, the time axis is in milliseconds. The granularity of the time axis is determined by the zoom slider in snapshot header 404. Below the time axis is a graphical display of the various components of a transaction. The visualization includes a set of rows 454, 456, 458, and 460 along an axis indicating the call stack position. Each row corresponds to a level of components. The top row pertains to the root component 470. Within each row are one or more boxes which identify the components. In one embodiment, the identification includes an indication of the category (which is the type of component: JSP, EJB, servlets, JDBC, etc.) and a name given to the component by the system. The root level component is identified by box 470 as JSP|Account. In the transaction snapshot, this root level component starts at time zero. The start time for the root level component is the start time for the transaction and the transaction ends when the root level component JSP|Account 470 completes. In the present case, the root level component completes in approximately 3800 milliseconds. Each of the levels below the root level 470 includes components called by the previous level. For example, the method identified by JSP|Account may call a servlet called CustomerLookup. Servlet|CustomerLookup 472 is called just after the start of JSP|Account 470 and terminates at just under 3500 milliseconds. Servlet|CustomerLookup 472 calls EJB|Entity|Customer 474 at approximately 200 milliseconds. EJB|Entity|Customer 474 terminates at approximately 2400 milliseconds, at which time Servlet|CustomerLookup 472 calls EJB|Session|Account 476. EJB|Session|Account 476 is started at approximately 2400 milliseconds and terminates at approximately 3400 milliseconds. EJB|Entity|Customer 474 calls JDBC|Oracle|Query 480 at approximately 250 milliseconds. JDBC|Oracle|Query 480 concludes at approximately 1000 milliseconds, at which time EJB|Entity|Customer 474 calls JDBC|Oracle|Update 482 (which itself ends at approximately 2300 milliseconds). EJB|Session|Account 476 calls JDBC|Oracle|Query 484, which terminates at approximately 3400 milliseconds. Thus, snapshot 402 provides a graphical way of displaying which components call which components. Snapshot 402 also shows for how long each component was executing. Thus, if the execution of JSP|Account 470 took too long, the graphical view of snapshot 402 will allow the user to see which of the subcomponents is to blame for the long execution of JSP|Account 470.

The transaction snapshot provides for the visualization of time from left to right and the visualization of the call stack top to bottom. Clicking on any view allows the user to see more details about the selected component. A user can easily see the run or execution time of a particular component that may be causing a transaction to run too slowly. If a transaction is too slow, it is likely that one of the components is running significantly longer than it should be. The user can see the execution times of each component and attempt to debug that particular component.

In one embodiment, the application performance management tool automatically identifies and reports one or more components that may be executing too slowly. The identification and reporting are performed without user intervention in one embodiment. Moreover, normal execution times for transactions and components can be dynamically and automatically generated.

Transactions are identified and component data reported, such as through the GUI depicted in FIGS. 6 and 7, to enable end-users to diagnose the root cause of a performance problem associated with a particular transaction. To further facilitate the management of application performance, the root cause of a performance problem is programmatically diagnosed in accordance with one embodiment. The diagnosis is implemented in one embodiment by analyzing the component data for a selected transaction. After analysis, one or more components are identified as a potential cause of the application's performance problem. These components can be reported to the end-user as an automatic diagnosis of the cause of an identified performance problem. Such implementations enable abnormally performing components of transactions to be programmatically identified and reported without user intervention. By eliminating required human analysis of raw component data, designers, managers, and administrators can more quickly, efficiently, and reliably identify poorly performing components.

FIG. 8 is a table depicting exemplary component data for four transactions of the same transaction type. The individual tasks performed for the illustrated transaction type are set forth in column 502. In a Java environment for example, each task may be a set(s) of code that is instantiated and executed for the associated component of each transaction. The transaction component refers to an instance of the code for the task that is executed during a particular transaction in such an implementation. In some embodiments, however, different sets of code can be used or instantiated to perform the same task for different transactions of the same type.

Data for each component of individual transactions that perform each task is set forth in each corresponding row. Transactions 1, 2, 3, and 4 each include a component for performing each of the identified tasks. Typically, the components of the transactions that perform the same task are of the same component type. Column 504 sets forth the data for transaction 1, column 506 sets forth the data for transaction 2, column 508 sets forth the data for transaction 3, and column 510 sets forth the data for transaction 4. By way of example, transaction 1 includes a first component that performs the task JSP|Account and has an execution time of 3825 ms. Transaction 1 further includes a second component having an execution time of 3450 ms for performing the task Servlet|CustomerLookup, a third component having an execution time of 2225 ms for performing the task EJB|Entity|Customer, a fourth component having an execution time of 990 ms for performing the task EJB|Session|Account, a fifth component having an execution time of 755 ms for performing the task JDBC|Oracle|Query, a sixth component having an execution time of 1310 ms for performing the task JDBC|Oracle|Update, and a seventh component having an execution time of 700 ms for performing the task JDBC|Oracle|Query a second time. Transactions 2, 3, and 4 also have components for performing each task.

Together, the execution times of the transactional components associated with a particular task form time series data for that task. Time series analytical techniques can be used on this data to determine if a component of a transaction performs abnormally. For example, after determining that a particular transaction has an execution time outside a threshold, the time series data can be used to identify one or more components of the transaction that may be causing the performance problem.

Column 512 sets forth a normal execution time associated with each task. In one embodiment, the normal execution time for each task is determined by averaging the execution times of each transaction component when performing that task. The normal execution time is a static value in one embodiment that is determined from past component executions prior to beginning transaction tracing. In another embodiment, the normal execution time is a dynamic value. For example, the normal execution time can be recalculated after every N transactions using the component execution times for the last N transactions. More sophisticated time series analytical techniques are used in other embodiments. For example, determining a normal execution time for a task can include identifying trends and seasonal variations in the time series data to predict a normal value for the task's execution time. Holt's Linear Exponential Smoothing is employed in one embodiment to determine a normal execution time for a transaction. Holt's Linear Exponential Smoothing is a known technique that combines weighted averaging and trend identification in a computationally low-cost manner, which makes it well suited to real-time updates of the normal value for task execution time.
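As a rough illustration of the two approaches named above, the following Python sketch computes a normal value by simple averaging and by Holt's linear (double) exponential smoothing. The function names, the smoothing constants `alpha` and `beta`, and the initialization of the level and trend from the first two observations are assumptions made for the example; the disclosure does not specify them.

```python
def average_normal(times):
    """Normal value as the plain mean of the observed execution times."""
    return sum(times) / len(times)


def holt_normal(times, alpha=0.5, beta=0.3):
    """One-step-ahead forecast using Holt's linear exponential smoothing.

    alpha smooths the level and beta smooths the trend; both defaults are
    illustrative assumptions.
    """
    if len(times) < 2:
        raise ValueError("need at least two observations")
    # Initialize the level and trend from the first two observations.
    level, trend = times[0], times[1] - times[0]
    for value in times[1:]:
        previous_level = level
        # Weighted average of the new observation and the prior forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Weighted average of the latest level change and the prior trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Predicted normal value for the next execution of the task.
    return level + trend
```

Each update needs only the previous level and trend, so the forecast can be maintained incrementally as new execution times arrive instead of re-reading the whole series.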

Column 514 sets forth a threshold for each task. If the time series data for a component deviates from the normal execution time for the associated task by more than the threshold, the component is identified as a potential cause of a performance problem. These components can be reported when diagnosing the root cause of an identified transactional performance problem. In one embodiment, threshold deviations are applied so as to only identify components having an execution time that exceeds the normal value by more than the threshold. In other embodiments, if the execution time is below the normal value by more than the threshold, the component can be identified. In yet another embodiment, a threshold execution time is applied directly to the component rather than a threshold deviation.
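The alternative threshold tests described above can be sketched as follows. This is an illustrative Python sketch; the function names and the `mode` argument are assumptions, and the numbers in the usage note are hypothetical rather than values taken from FIG. 8.

```python
def is_abnormal(execution_time, normal, threshold, mode="above"):
    """Test a component's execution time against a threshold deviation.

    mode "above" flags only times exceeding normal by more than the threshold,
    "below" flags only times under normal by more than the threshold, and
    "both" flags a deviation in either direction.
    """
    deviation = execution_time - normal
    if mode == "above":
        return deviation > threshold
    if mode == "below":
        return -deviation > threshold
    if mode == "both":
        return abs(deviation) > threshold
    raise ValueError(f"unknown mode: {mode}")


def exceeds_limit(execution_time, limit):
    """Alternative embodiment: compare directly to a threshold execution time."""
    return execution_time > limit
```

For instance, with a hypothetical normal of 3800 ms and a threshold deviation of 50 ms, `is_abnormal(3900, 3800, 50)` flags the component, while `is_abnormal(3700, 3800, 50)` does not unless the mode is "below" or "both".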

Row 516 sets forth the total execution time of each transaction as well as a normal execution time and threshold. The total transaction time is equal to the sum of the execution times of the components of the transaction. The normal value can be calculated as previously described. Simple averaging of a number of transaction execution times or more sophisticated time-series techniques can be applied. The threshold can also be calculated as previously described. Static or dynamic threshold values can be used. The threshold can be expressed as a threshold execution time for the transaction or a threshold deviation from a normal value for the type of transaction.

The total transaction time can be compared to the normal value using the threshold deviation (or compared directly to a threshold transaction time). Those transactions having a total execution time beyond the threshold can be identified and reported, for example, as shown in FIGS. 6 and 7. For the reported transactions, the component data can be examined to determine if there were any abnormalities. For example, transaction 3 has a total execution time of 13,275 ms. This transaction time is beyond the threshold execution time so the transaction is reported. The JSP|Account component had an execution time of 3900 ms, which deviated from the normal value by more than the threshold. This component can be reported for transaction 3. In some embodiments, only transactions having an execution time over the normal value by the threshold are reported. In one embodiment, if a transaction has an execution time above the normal value by more than the threshold, only components having execution times that are above their corresponding normal value are reported. That is, components that have an execution time below the normal by more than their threshold will not be reported. In other embodiments, components with execution times below their normal by more than the threshold amount can be reported as well. For transactions having execution times below the normal by more than the threshold, components above and/or below their normal values by more than the threshold can be reported as well.

In FIG. 9, an embodiment is depicted whereby component data is used to formulate time series data according to the systems involved in the type of transaction. In implementations where each component is directly associated with a particular system, system-level time series data may correspond directly to task-level time series data. In other implementations, such as where transactional components for the same task may execute on different systems in different transactions, such correspondence may not exist and the time series data will be different. Data for multiple tasks may also be grouped by system to consolidate data.

FIG. 9 depicts time series data for a set of web-based transactions involving a browser, network, web server, identity server, application server, database server, messaging server, and CICS server. The individual systems are listed in column 520. Common web-based transactions represented by the example in FIG. 9 could include an initial browser request issued over the network to the web server to complete a purchase, request information, etc. The web server calls the identity server to authenticate the user and then calls the application server to complete the transaction. The application server issues a call to the database server, messaging server, and CICS server to perform the transaction. The application server then returns a result to the web server, which in turn responds to the browser over the network.

Columns 522, 524, 526, and 528 list the execution times at each system by individual components of transactions 1, 2, 3, and 4, respectively. Each entry for a transaction may correspond to the execution time of one or more components of the transactions that are associated with the identified system. By way of example, transaction 1 includes execution times of 9.8 ms for the browser component(s), 99.8 ms for the network component(s), 9.9 ms for the web server component(s), 198 ms for the identity server component(s), another 10.1 ms for the web server component(s), 51 ms for the application server component(s), 98 ms for the database server component(s), 49.5 ms for the application server component(s), 101 ms for the messaging server component(s), 21 ms for the application server component(s), 200 ms for the CICS server component(s), 29.5 ms for the application server component(s), 10.1 ms for the web server component(s), 99.8 ms for the network component(s), and 10.3 ms for the browser component(s). Particular systems are listed more than once for the transactions to represent that these systems are involved in the transaction at multiple points. Different components of the transactions may be invoked to perform different tasks at the systems during these different points of the transactions.

Normal execution times are depicted in column 530 for each system during each individual part of the transaction. Like the values depicted in FIG. 8, the normal execution times can be static or dynamic values. Different analysis techniques including simple averaging, Holt's Linear Exponential Smoothing, and more can be used to calculate the normal values as before. Threshold deviations from the normal values are set forth in column 532. In the system-based technique of FIG. 9, systems can be identified and reported when their execution time for a transaction is detected as having deviated from its corresponding normal value by the threshold amount or more. Again, deviations above and/or below normal can be used to identify systems, as well as threshold execution times.

Row 534 sets forth the total execution time for each transaction based on the execution time of each system involved in the transaction. A normal transaction time and threshold are set forth in columns 530 and 532 for the overall transaction. In FIG. 9, transaction 3 has exceeded the normal execution time by more than the threshold. The components corresponding to the database server are beyond the database server normal value by more than the corresponding threshold and can be reported as a potential cause of the performance problem associated with transaction 3.

Another set of time series data for a set of transactions is depicted in FIG. 10. The set of transactions depicted in FIG. 10 is similar to the set of transactions in FIG. 9. However, the execution times for each individual system have been grouped together and the raw execution times converted into percentages of total transaction time. Column 550 lists the systems involved in the transactions. Each system's total percentage of transaction time for transactions 1, 2, 3, and 4 is set forth in columns 552, 554, 556, and 558, respectively. For transaction 1, the browser makes up 2.0% of the total transaction time, the network makes up 20.0% of the total transaction time, the web server makes up 3.0% of the total transaction time, the identity server makes up 20.0% of the total transaction time, the application server makes up 15.0% of the total transaction time, the database server makes up 10.0% of the total transaction time, and the messaging server makes up 10.0% of the total execution time. Normal values for each system's total transaction time are set forth in column 560 as a percentage of total transaction time. Threshold deviations from the normal percentage values are listed in column 562. In this embodiment, a system can be identified and reported when its percentage of total execution time for a transaction deviates from the normal for the transaction type by more than the threshold. Again, deviations above and/or below the normal value can be detected in various embodiments. Direct threshold percentages can also be used.

Row 564 sets forth the total execution time for each transaction based on the execution time of each system involved in the transaction. A normal transaction time and threshold are set forth in columns 560 and 562 for the overall transaction. While percentages are used for the individual component values, actual time values are used for determining if a transaction is beyond a threshold execution time value. In FIG. 10, transactions 3 and 4 have total execution times beyond the threshold. These transactions will be reported. The application server is reported as a possible cause of the performance problem with transaction 3 and the network is reported as a possible cause of the performance problem with transaction 4.
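The grouping of per-visit execution times by system and their conversion into percentages of total transaction time can be sketched as follows. This Python sketch uses the transaction 1 execution times quoted for FIG. 9, with one entry per visit in the described call sequence; the `percent_by_system` function and the data layout are assumptions for illustration, and FIG. 10 is described only as a similar data set, so its percentages need not match exactly.

```python
from collections import defaultdict

# Transaction 1 visits as (system, execution time in ms), in call order.
TRANSACTION_1 = [
    ("browser", 9.8), ("network", 99.8), ("web server", 9.9),
    ("identity server", 198.0), ("web server", 10.1),
    ("application server", 51.0), ("database server", 98.0),
    ("application server", 49.5), ("messaging server", 101.0),
    ("application server", 21.0), ("CICS server", 200.0),
    ("application server", 29.5), ("web server", 10.1),
    ("network", 99.8), ("browser", 10.3),
]


def percent_by_system(visits):
    """Group visit times by system and express each as a share of the total."""
    totals = defaultdict(float)
    for system, ms in visits:
        totals[system] += ms          # consolidate repeated visits per system
    total = sum(totals.values())      # total transaction time in ms
    shares = {system: 100.0 * ms / total for system, ms in totals.items()}
    return total, shares
```

The total is kept in milliseconds alongside the percentages because the transaction-level test still uses actual time values, while the per-system tests use the percentage shares.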

FIG. 11 is a flowchart of one embodiment for tracing transactions and providing programmatic root cause analysis of detected performance problems. At step 600, the various agents implemented in the transactional system acquire data. Agents may acquire data directly from transaction components running on the same system. Agents may acquire data from other components, such as browsers, external database servers, etc. by monitoring response times and/or installing code such as JavaScript to monitor and report execution times. An agent that initiates tracing, for example, may add a script to a web page to monitor the execution time of a browser in performing a transaction. At step 602, the various agents report data to the enterprise manager.

In one embodiment, the agent(s) continuously acquire data for the various metrics they are monitoring. Thus, step 600 may be performed in parallel to the other steps of FIG. 11. Each agent can be configured to report data to the enterprise manager at step 602. For example, the agents may report data every 7.5 seconds or every 15 seconds. The reported data may be data for one or more transactions. In one embodiment, the agent(s) will sample data for a particular transaction at every interval. In one embodiment, an agent associated with a component that receives an initial request starting a transaction will operate as an entry point agent. The entry point agent can modify the request header (e.g., by adding a flag) to indicate to other agents in the system to report data for the corresponding transaction. When the other agents receive the header with the flag, they will report the monitored data for the corresponding transaction to the enterprise manager 120.
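The header-flag coordination between the entry point agent and downstream agents might look like the following Python sketch. The header name `X-Trace-Report`, its value, and both function names are hypothetical; the disclosure says only that a flag is added to the request header.

```python
TRACE_FLAG = "X-Trace-Report"  # hypothetical header name, not from the source


def mark_request(headers):
    """Entry point agent: return a copy of the headers with the flag added."""
    flagged = dict(headers)
    flagged[TRACE_FLAG] = "1"
    return flagged


def should_report(headers):
    """Downstream agent: report data only for requests carrying the flag."""
    return headers.get(TRACE_FLAG) == "1"
```

Copying the headers rather than mutating them in place is a design choice of the sketch; it keeps the original request object unchanged for any code that still holds a reference to it.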

The enterprise manager can be configured to wake up and process data at a specified interval. For example, the enterprise manager can wake up every 15 seconds and process the data from the agents reported during two 7.5 second intervals. This data may be appended to a spool file or query file at step 602. More information regarding the collection of data by the agents and processing by the enterprise manager can be found in U.S. patent application Ser. No. 11/033,589, entitled “Efficient Processing of Time Series Data,” incorporated herein by reference in its entirety.

The enterprise manager formulates time series data for the various components of the monitored transactions at step 604. The enterprise manager can create a data structure such as those depicted in FIGS. 8, 9, and 10 in one embodiment, although other data structures can be used. The enterprise manager can formulate time series data by task as depicted in FIG. 8, or by system as depicted in FIGS. 9 and 10.

The method depicted in FIG. 11 can be performed for each transaction being monitored. As such, step 604 can include appending component data for the selected transaction to previously collected data. At step 606, the enterprise manager determines if the total transaction time exceeded a threshold. Step 606 can include comparing the total transaction time to a threshold time or determining whether the total time deviated from a normal transaction time by more than a threshold value. If the total transaction time did not exceed the corresponding threshold, tracing for the transaction completes at step 608.

If the total transaction time exceeds the threshold, component data for the transaction is identified at step 610. The component data can be maintained by individual tasks with which the transactional components are associated as shown in FIG. 8, or by system as shown in FIGS. 9 and 10. At step 612, the enterprise manager determines if a first component of the transaction exceeded the threshold for the associated task or system. The enterprise manager determines if the component execution time deviated from a normal value for the task or system by more than a threshold in one embodiment. In another embodiment, the component execution time (or percentage) is compared to a threshold execution time. If the component has exceeded the relevant threshold, the component is identified as a potential cause of a transaction performance problem at step 614.

After identifying the component or determining that it did not exceed its threshold, the enterprise manager determines at step 616 whether there are additional components of the transaction to analyze. If additional components remain, the method proceeds to step 612 where the enterprise manager examines the execution time of the next component. After analyzing each component of the transaction, the enterprise manager reports the identified components at step 618. Step 618 can include making an indication in the graphical user interface depicted in FIGS. 6 and 7. The identified components can be highlighted in transaction snapshot window 402 for example. Other indications can be used as well.
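Steps 606 through 618 can be summarized in a short Python sketch. It assumes the embodiment in which only deviations above normal are flagged and in which the total transaction time is the sum of the component times; the `diagnose` function, its arguments, and the data in the usage note are hypothetical.

```python
def diagnose(components, normals, thresholds, total_normal, total_threshold):
    """Return the components flagged as potential causes, or None if the
    transaction itself is within its threshold.

    components is a list of (task, execution time in ms) pairs; normals and
    thresholds map each task to its normal value and threshold deviation.
    """
    total = sum(ms for _, ms in components)
    # Step 606: did the total transaction time exceed its threshold?
    if total - total_normal <= total_threshold:
        return None                               # step 608: nothing to report
    flagged = []
    for task, ms in components:                   # steps 612 and 616: each component
        if ms - normals[task] > thresholds[task]:
            flagged.append((task, ms))            # step 614: potential cause
    return flagged                                # step 618: report
```

For example, with hypothetical components `[("JSP|Account", 3900), ("JDBC|Oracle|Query", 750), ("JDBC|Oracle|Query", 1900)]`, normals of 3800 and 750, and thresholds of 50 and 100, a slow transaction would have the first and third components flagged. A list of pairs is used rather than a dictionary so that a task performed twice in one transaction keeps both measurements.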

Thresholds for analyzing transaction execution times are dynamically updated using time-series analysis techniques in one embodiment. These analysis techniques can be performed in real-time for each transaction type. FIG. 12 is a flowchart depicting one technique for providing dynamic thresholds in one embodiment. FIG. 12 can be performed as part of step 606 in FIG. 11 in one embodiment. At step 702, the enterprise manager determines if the threshold for the particular type of transaction is to be updated. The enterprise manager may be configured to update the threshold for a particular type of transaction after receiving data for a certain number of transactions of that type. Other techniques may be employed to determine when to update a threshold for a type of transaction. If the threshold is to be updated, the enterprise manager identifies the execution time of the last N transactions for which the manager received data. The actual number of transactions can vary by implementation. A new threshold for the type of transaction is developed at step 706. In one embodiment, step 706 includes determining a normal value for the execution time of the particular type of transaction. The threshold can then be set to a time at a certain level or percentage (variable) above and/or below the normal value. The threshold may also be expressed as a threshold deviation from the normal time (above and/or below). Thus, step 706 can include determining a new normal time for the transaction type and/or a new threshold to be applied. After developing the new threshold and/or normal value for the transaction type, the new values are applied for the particular transaction being analyzed at step 708.
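One way to realize the periodic update of FIG. 12 is sketched below in Python. The `DynamicThreshold` class, the use of a simple average for the normal value, the percentage-above-normal threshold, and the choice to compute the new normal from the N transactions preceding the one being analyzed are all assumptions of this sketch.

```python
from collections import deque


class DynamicThreshold:
    """Recompute the normal value for a transaction type every N transactions."""

    def __init__(self, window, percent_above, initial_normal):
        self.recent = deque(maxlen=window)   # execution times of the last N transactions
        self.percent_above = percent_above   # threshold as a percentage above normal
        self.normal = initial_normal
        self.seen_since_update = 0

    @property
    def threshold(self):
        """Threshold execution time derived from the current normal value."""
        return self.normal * (1 + self.percent_above / 100.0)

    def check(self, execution_time):
        """Update if due (steps 702-706), then test this transaction (step 708)."""
        if self.seen_since_update >= self.recent.maxlen:
            # Step 706: new normal from the last N transactions received.
            self.normal = sum(self.recent) / len(self.recent)
            self.seen_since_update = 0
        exceeded = execution_time > self.threshold
        # Record this transaction for the next update.
        self.recent.append(execution_time)
        self.seen_since_update += 1
        return exceeded
```

Updating from the preceding transactions before testing the current one keeps a single slow transaction from raising its own threshold. Swapping the average for a trend-aware forecast would change only the line that assigns `self.normal`.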

The thresholds used when analyzing the individual components of transactions can also be updated dynamically. FIG. 13 is a flowchart depicting one method for dynamically updating a threshold for analyzing transactional components as the possible root cause of performance problems. In one embodiment, FIG. 13 is performed at step 614 when analyzing a component execution time for a transaction. At step 720, the enterprise manager determines if the task or system threshold information corresponding to the type of component being analyzed is to be updated. The enterprise manager updates threshold information after receiving data for a particular number of transactions that include a component associated with the particular task and/or system in one embodiment. Other update periods can be used. If the threshold information is not to be updated, the component is analyzed using the existing threshold and/or normal data for the particular task.

If the threshold data is to be updated, the enterprise manager identifies the execution times of components associated with the particular task during the last N transactions at step 722. The number of transactions can vary by embodiment and, in particular, with the type of analysis techniques to be employed at step 724. A normal value for the particular task is determined at step 724 using the identified data. In one embodiment, the last N execution times are averaged. In other embodiments, trends and seasonal variations can be identified to predict a new normal value. Holt's Linear Exponential Smoothing is used in one implementation to combine weighted averaging and trend identification in a low-cost way for a real-time update of the normal value. At step 726, the enterprise manager determines whether the threshold for the task is to be updated. In some embodiments, a threshold is used that is expressed as a deviation from normal. This value can remain the same regardless of the normal value determined at step 724. In other embodiments, the threshold deviation is changed as well. If the threshold is to be updated, the enterprise manager updates the necessary value at step 728. A new threshold deviation can be selected or a new threshold execution time selected. At step 730, the new threshold deviation and/or normal execution time is applied to analyze the particular component.
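
For reference, the textbook form of Holt's linear exponential smoothing keeps a smoothed level and a smoothed trend and forecasts the next value as their sum. The Python sketch below applies that standard form to the last N execution times of a task. The smoothing constants `alpha` and `beta`, the initialization from the first two observations, and the function names are assumptions for illustration and are not taken from the description.

```python
def holt_normal(times, alpha=0.5, beta=0.5):
    """One-step-ahead forecast of the normal execution time.

    Standard Holt recurrences:
        level_t = alpha * x_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
        trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    """
    if len(times) < 2:
        raise ValueError("need at least two observations")
    level = times[0]
    trend = times[1] - times[0]  # assumed initialization of the trend
    for x in times[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)          # weighted averaging
        trend = beta * (level - previous_level) + (1 - beta) * trend  # trend identification
    return level + trend


def deviates(execution_ms, normal_ms, threshold_deviation_ms):
    """True if the execution time differs from normal by more than the threshold deviation."""
    return abs(execution_ms - normal_ms) > threshold_deviation_ms


if __name__ == "__main__":
    normal = holt_normal([100.0, 110.0, 120.0, 130.0])
    print(normal, deviates(200.0, normal, 25.0))
```

Each update touches every observation once and stores only two running values, which is consistent with the low-cost, real-time character the description attributes to the technique.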

FIG. 14 is a flowchart describing one embodiment of a process for reporting data in the transaction trace table 400. The process of FIG. 14 is performed by a workstation in one embodiment. In step 800, the workstation receives transaction information from enterprise manager 120. In step 802, the data is stored. In step 804, the data is added to the transaction table as a new row on table 400.

FIG. 15 is a flowchart describing one embodiment of a process for displaying a transaction snapshot. In step 820, the GUI receives a selection of a transaction. That is, the user selects one of the rows of transaction trace table 400. Each row of transaction trace table 400 represents data for one particular transaction. The user can select a transaction by clicking on the row. In other embodiments, other means can be used for selecting a particular transaction. In step 822, the data stored for that selected transaction is accessed. In step 824, the axis for the transaction snapshot is set up. In one embodiment, the system renders the time axis along the X axis. For example, in the embodiment depicted in FIG. 6, the time axis is from zero ms to 6000 ms. The zoom slider in snapshot header 404 (see FIG. 6) is used to change the time axis. In some embodiments, configuration files can be used to change the time. In one embodiment, the actual line representing the axis for call stack position is not rendered. However, the axis is used as described herein. In step 826, the view for the root component is drawn. For example, in transaction snapshot 402, the view for “JSP|Account” is drawn. In step 828, views for each of the components of the root component are drawn. Additionally, the system recursively draws views for each component of each higher level component. For example, looking at FIG. 6, the first root component JSP|Account is drawn. Then, the components of the root component are drawn (e.g., “Servlets|CustomerLookup” is drawn). Then, recursively for each component, a view is drawn. First, a view is drawn for EJB|Entity|Customer, then the components of EJB|Entity|Customer are drawn (e.g., JDBC|Oracle|Query and JDBC|Oracle|Update). After the components for EJB|Entity|Customer are drawn, the view for EJB|Session|Account is drawn, followed by the component JDBC|Oracle|Query.
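
The drawing order described for steps 826 and 828 corresponds to a depth-first, parent-before-children traversal of the call tree. The Python sketch below illustrates that order. The tuple representation of a component is hypothetical, and the nesting of the two EJB components under Servlets|CustomerLookup is an assumption inferred from the order given in the example.

```python
def draw_order(component, depth=0):
    """Return (depth, name) pairs in the order the views would be drawn.

    A component is modeled as (name, [child components]).
    """
    name, children = component
    order = [(depth, name)]  # draw this component's view first
    for child in children:   # then recurse into each called component
        order.extend(draw_order(child, depth + 1))
    return order


# Call tree from the example; the nesting under the servlet is assumed.
SNAPSHOT = (
    "JSP|Account", [
        ("Servlets|CustomerLookup", [
            ("EJB|Entity|Customer", [
                ("JDBC|Oracle|Query", []),
                ("JDBC|Oracle|Update", []),
            ]),
            ("EJB|Session|Account", [
                ("JDBC|Oracle|Query", []),
            ]),
        ]),
    ],
)

if __name__ == "__main__":
    for depth, name in draw_order(SNAPSHOT):
        print("  " * depth + name)
```

The depth returned with each name is the component's level in the call stack, which is the quantity a renderer would map to the call stack position axis.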

FIG. 16 is a flowchart describing one embodiment of a process for drawing a view for a particular component. In step 850, the relative start time is determined. In one embodiment, if the view is the root component the start time is at 0 ms. If the view is not from the root component, then the timestamp of the start of the component is compared to the timestamp of the start of the root component. The difference between the two timestamps is the start time for the component being rendered. In step 852, the relative stop time is determined. By relative, it is meant relative to the root component. Thus, the stop time is determined for the component being rendered. The stop time of the component being rendered is compared to the stop time of the root component. The difference in the actual stop time of the root component as compared to the actual stop time of the component under consideration is subtracted from the stop time of the root component in the transaction snapshot 402. In step 854, the X values (time axis) of the start and end of the rectangle for the view are determined based on the relative start time, relative stop time, and the zoom factor. Based on knowing the relative start time, the relative stop time, and the extent of the zoom slider, the exact coordinate of the beginning of the rectangle and the end of the rectangle can be determined. In step 856, the Y values (call stack position axis) of the top and bottom of the rectangle are determined based on the level of the component. That is, the Y values of all of the rectangles are predetermined based on whether it is the root component, the first component thereof, second subcomponent, third subcomponent, etc. In step 858, the view is added to the transaction snapshot. In step 860, an additional view box for the calling component is also added. The calling component is a component that invokes the component being drawn. For example, in transaction snapshot 402, the calling component of Servlets|CustomerLookup is JSP|Account. At step 862, the view for the component in transaction snapshot 402 is highlighted if the component data indicates that the component exceeded its relevant threshold. Step 862 is optional. In other embodiments, different indications can be made in transaction snapshot 402 for components that exceed a threshold during the transaction.
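
The geometry of steps 850 through 856 can be illustrated with a short Python sketch. Modeling the zoom factor as a `pixels_per_ms` scale, using a fixed `row_height` per call stack level, and the function name itself are assumptions for illustration only.

```python
def view_rectangle(root_start, root_stop, comp_start, comp_stop, level,
                   pixels_per_ms=0.5, row_height=20):
    """Return (x_start, x_end, y_top, y_bottom) for one component's view.

    Timestamps are in milliseconds; level 0 is the root component.
    """
    # Step 850: start time relative to the start of the root component.
    rel_start = comp_start - root_start
    # Step 852: the root's stop time within the snapshot, less the gap between
    # the root's actual stop time and the component's actual stop time.
    root_rel_stop = root_stop - root_start
    rel_stop = root_rel_stop - (root_stop - comp_stop)
    # Step 854: X values from the relative times and the zoom factor.
    x_start = rel_start * pixels_per_ms
    x_end = rel_stop * pixels_per_ms
    # Step 856: Y values are fixed by the component's level in the call stack.
    y_top = level * row_height
    y_bottom = y_top + row_height
    return x_start, x_end, y_top, y_bottom


if __name__ == "__main__":
    # Root runs from 1000 ms to 7000 ms; a first-level component from 1500 ms to 3500 ms.
    print(view_rectangle(1000, 7000, 1500, 3500, level=1))
```

Under this reading, the relative stop time reduces to the component's stop timestamp minus the root's start timestamp; the sketch keeps the two-step form only to mirror the wording of step 852.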

FIG. 17 is a flowchart describing one embodiment of a process for reporting detailed information about a component of the transaction. That is, when the user selects one of the components in transaction snapshot 402, detailed information is provided for that component in component information region 408, analysis region 410 and properties region 412. In step 870, the GUI receives the user's selection of a component. In step 872, the stored data for the chosen component is accessed. In step 874, the appropriate information is added to component information region 408. That is, the stored data is accessed and information indicating the type of component, the name of the component, and the path to the component is accessed and reported. Each of these data values is depicted in component information region 408. In step 876, data is added to the analysis region 410. That is, the system accesses the stored duration (or calculates the duration), the timestamp, the start of the component relative to the start of the root component, and determines the percentage of transaction time used by that component. These values are displayed in the analysis region 410. The percentage of transaction time is calculated by dividing the duration of the selected component by the duration of the root component and multiplying by 100%. Step 876 can include providing an indication if the component exceeded its relevant threshold. In step 878, data is added to the properties region. In one embodiment, the properties region will display the method invoked for the component. In other embodiments, other additional parameters can also be displayed. In one embodiment, regions 408, 410, and 412 are configurable to display whatever the user configures them to display.
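
The percentage calculation of step 876 is simple enough to state directly. The Python sketch below uses a hypothetical function name and rejects a non-positive root duration, a guard the description does not mention.

```python
def percent_of_transaction(component_ms, root_ms):
    """Percentage of the transaction time used by one component."""
    if root_ms <= 0:
        # Assumed guard: the root duration must be positive to divide by it.
        raise ValueError("root duration must be positive")
    return component_ms / root_ms * 100.0


if __name__ == "__main__":
    # A 1500 ms component inside a 6000 ms transaction.
    print(percent_of_transaction(1500, 6000))
```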

The user interface of FIG. 8 also includes a set of drop down menus. One of these menus can be used to allow the user to request a text file to be created. In response to the request by the user, the system will write all (or a configurable subset) of the information that is and/or can be displayed by the graphical user interface into a text file. For example, a text file can include the category, component name, timestamp, duration, percentage of the transaction time, URL, userID, host, process, agent, all of the called subcomponents and similar data for the called subcomponents. Any and all of the data described above can be added to the text file.

The above discussion contemplates that the filter used by the agent to determine whether to report a transaction is based on execution time. In other embodiments, other tests can be used. Examples of other tests include choosing based on UserID, providing a random sample, reporting any transaction whose execution time varies by a standard deviation, etc.
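
The alternative tests listed above can be pictured as interchangeable predicates over a transaction record. The Python sketch below is illustrative only: the dictionary keys `user_id` and `execution_ms`, the function names, and the reading of "varies by a standard deviation" as "differs from the mean by more than k standard deviations" are all assumptions.

```python
import random
import statistics


def by_user(user_ids):
    """Report transactions issued by any of the given user IDs."""
    return lambda txn: txn["user_id"] in user_ids


def random_sample(rate, rng=None):
    """Report a random fraction `rate` of all transactions."""
    rng = rng or random.Random()
    return lambda txn: rng.random() < rate


def beyond_std_dev(history_ms, k=1.0):
    """Report transactions more than k standard deviations from the mean."""
    mean = statistics.mean(history_ms)
    std = statistics.pstdev(history_ms)  # population standard deviation of the history
    return lambda txn: abs(txn["execution_ms"] - mean) > k * std


if __name__ == "__main__":
    txn = {"user_id": "alice", "execution_ms": 130.0}
    filters = [by_user({"alice"}), beyond_std_dev([100.0, 100.0, 120.0, 80.0])]
    print([f(txn) for f in filters])
```

Because every filter has the same shape (a transaction in, a boolean out), an agent could be configured with any one of them, or a combination, without changing the reporting path.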

The foregoing detailed description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.

1. A method of processing data, comprising: collecting data about a set of transactions, each transaction including a plurality of components associated with a plurality of tasks, said data including time series data for each task based on execution times of components associated with said each task during said set of transactions; determining whether said transactions have execution time values beyond a threshold; for each transaction having an execution time value beyond said threshold, automatically identifying one or more components based on a deviation in time series data for a task that is associated with said one or more components of said each transaction; and reporting said one or more components for said each transaction.
2. The method of claim 1, wherein collecting data about said set of transactions includes, for each transaction of said set: determining a total execution time; and determining an execution time of each component of said each transaction.
3. The method of claim 2, wherein: collecting data about said set of transactions further includes determining a percentage of total execution time associated with each component for each transaction of said set; and said time series data for each task includes a percentage of total execution time of an associated component for each transaction of said set.
4. The method of claim 3, further comprising: determining a normal percentage of total execution time for each task based on said percentage of total execution time of components associated with said each task.
5. The method of claim 4, wherein identifying one or more components includes determining whether a deviation in time series data for each task exceeds a threshold deviation; said threshold deviation for each task is a threshold deviation from said normal percentage of total execution time for said each task.
6. The method of claim 5, wherein: said threshold deviation is a static value.
7. The method of claim 2, wherein: said method further comprises determining a normal execution time for each task based on execution times of components associated with said each task; said identifying one or more tasks includes determining whether a deviation in time series data for each of said tasks exceeds a threshold deviation; and said threshold deviation for each task is a threshold deviation from said normal execution time for each task.
8. The method of claim 1, wherein: said threshold is a static threshold execution time associated with said set of transactions.
9. The method of claim 1, further comprising: determining said threshold dynamically based on execution times associated with a number of transactions of said set.
10. The method of claim 1, wherein: said method further comprises, for each of said tasks, determining a threshold deviation dynamically based on execution times of said components associated with said each task during a number of transactions of said set; said identifying one or more components includes identifying said one or more components using said threshold deviations.
11. The method of claim 10, wherein: determining said threshold deviation for each task includes applying weighted averaging and trend identification to said execution times of said components.
12. The method of claim 10, wherein: determining said threshold deviation for each task includes applying Holt's linear exponential smoothing to said execution times of said components.
13. The method of claim 1, wherein identifying one or more components includes identifying said one or more components by comparing a deviation in said time series data for a particular task to a static threshold deviation.
14. The method of claim 1, wherein: said time series data for each task is organized as time series data for a system on which said each task is executed.
15. The method of claim 1, wherein said time series data for each task is organized into combined time series data for multiple tasks that execute on said particular system.
16. The method of claim 1, wherein reporting said one or more components for each transaction having an execution time beyond said threshold includes: providing an indication that said one or more components are a potential cause for said each transaction having an execution time beyond said threshold.
17. The method of claim 16, wherein: said method further comprises graphically displaying every component of each transaction having an execution time beyond said threshold; said providing an indication includes highlighting a graphical display of said one or more components.
18. The method of claim 1, wherein: reporting said one or more components comprises reporting less than all of said components for said each transaction.
19. The method of claim 1, wherein: said set of transactions involves at least one application; said method further comprises accessing said at least one application; said method further comprises automatically modifying said at least one application to add additional code for enabling said collecting data about said set of transactions.
20. A method of monitoring software, comprising: monitoring sets of code, each set of code including a plurality of components; determining whether said sets of code satisfy a filter; for each set of code satisfying said filter, automatically identifying one or more components thereof as a potential cause of said set of code satisfying said filter; reporting each set of code that satisfies said filter, said automatically reporting includes automatically reporting said one or more components for said each set of code as said potential cause.
21. The method of claim 20, wherein: determining whether said sets of code satisfy a filter includes determining whether said sets of code have execution time values beyond a threshold.
22. The method of claim 21, wherein monitoring said sets of code includes: collecting data about said sets of code as they are executed, said components are associated with a plurality of component types, said data includes time series data for each component type based on execution times of components associated with said each component during execution of said sets of code.
23. The method of claim 22, wherein identifying one or more components includes, for each component of each set of code that satisfies said filter: determining whether said time series data for a component type of said each component includes a deviation for said each set of code that is beyond a threshold for said component type.
24. One or more processor readable storage devices having processor readable code embodied on said processor readable storage devices, said processor readable code for programming one or more processors to perform a method comprising: monitoring sets of code, each set of code including a plurality of components, each component of a different component type; monitoring said plurality of components for each set of code to develop time series data for each component type, said time series data for each component type including data for a corresponding component of each set of code; determining whether said sets of code have an execution time value beyond a threshold; for sets of code having an execution time value beyond said threshold, determining whether said time series data for each component thereof is outside a component threshold for a corresponding component type; and reporting components having time series data outside said component threshold for their corresponding component type.
25. One or more processor readable storage devices according to claim 24, wherein said method further comprises: dynamically determining a normal execution time for each component type using corresponding time series data for said each component type; and dynamically determining said component threshold for each component type using corresponding time series data for said each component type.
26. One or more processor readable storage devices according to claim 25, wherein: said component threshold for each component type is a threshold deviation from said normal execution time for said each component type.
27. One or more processor readable storage devices according to claim 25, wherein: said normal execution time for each component type is a normal percentage of execution time for said sets of code; and said threshold deviation for each component type is a threshold percentage deviation from said normal percentage of execution time for said sets of code.
28. One or more processor readable storage devices according to claim 24, wherein: said time series data for each component type is organized according to a system on which components of said each component type execute.
29. One or more processor readable storage devices according to claim 24, wherein: each component type is associated with a particular task; said time series data for each component type is organized according to said particular task with which said component type is associated.
30. One or more processor readable storage devices according to claim 24, wherein: said sets of code are transactions.
31. One or more processor readable storage devices according to claim 24, wherein said method further comprises: accessing at least one application, said at least one application includes said sets of code; and automatically modifying said at least one application to add additional code for enabling said monitoring of said sets of code and said monitoring of said plurality of components of each set of code.
32. An apparatus for monitoring software, comprising: one or more agents, said one or more agents collect data about a set of transactions, each transaction including a plurality of components associated with a plurality of systems; and a manager in communication with said one or more agents, said manager performs a method comprising: receiving said data about said set of transactions from said one or more agents, developing time series data for each of said systems based on execution times of components associated with said each system during said set of transactions, for each transaction having an execution time beyond a threshold, identifying one or more components based on a deviation in time series data for a system that is associated with said one or more components of said each transaction, and reporting said one or more components for said each transaction.
33. An apparatus according to claim 32, wherein: reporting said one or more components for said each transaction comprises reporting less than all of said components of said each transaction.
34. An apparatus according to claim 32, wherein: each transaction of said set is of a same transaction type; said threshold is for said same transaction type; said method further comprises dynamically determining said threshold based on execution times for each of said transactions; and said method further comprises dynamically determining a component threshold for each system based on time series data for said each system.
35. An apparatus according to claim 32, wherein: said time series data for each of said systems is organized according to a type of component associated with each of said systems.
36. A method of processing data, comprising: collecting data about a set of transactions, each transaction including a plurality of components, each component associated with a component type, said data including time series data for each component type based on execution times of corresponding components of each transaction; dynamically determining a threshold value for each component type using said time series data for said each component type; comparing an execution time value of each component of each transaction with said threshold value for a corresponding component type; identifying components having an execution time value beyond said threshold value for their corresponding component type; and automatically identifying and reporting components having an execution time value beyond said threshold value for their corresponding component type.
37. The method of claim 36, wherein: dynamically determining a threshold value for each component type includes applying Holt's linear exponential smoothing to said time series data for each component type.
38. The method of claim 36, wherein: said method further comprises dynamically determining a normal value for each component type; said comparing an execution time value of each component with said threshold value for a corresponding component type includes determining whether said execution time value of said each component deviates from said normal value for said corresponding component type by more than said threshold value for said corresponding component type.